{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## EDA de producción eléctrica y precio en España 2015-2018\n",
"\n",
"#### Autor: Antonio Romero Mtnez-Eiroa"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Introducción\n",
"\n",
"En el presente informe se ofrece un análisis exploratorio de los datos relativos a la producción eléctrica en España y su\n",
"precio entre los años 2015 y 2018.\n",
"\n",
"Estos datos comprenden básicamente información relativa a la cantidad de energía producida atendiendo a los distintos medios de generación, así como la producción total real y la esperada, y el precio real y el esperado. Toda esta información se desglosa por horas.\n",
"\n",
"El objetivo de este análisis en conocer como se relaciona cada medio de generación con la producción total de energía y esta con el precio. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Carga de librerias\n",
"Se cargan las librerias necesarias para realizar el análisis exploratorio."
]
},
{
"cell_type": "code",
"execution_count": 219,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"from pandas_profiling import ProfileReport\n",
"\n",
"import seaborn as sns\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Carga de datos\n",
"Se carga el dataframe que va a ser analizado en este informe."
]
},
{
"cell_type": "code",
"execution_count": 220,
"metadata": {},
"outputs": [],
"source": [
"energy = pd.read_csv('../data/energy_dataset.csv', delimiter = ',') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tratamiento de datos\n",
"En primer lugar se carga el dataset para tener una idea visual de como se organiza. Puesto que no se muestra al completo, posteriormente se muestra el nombre de todas las variables."
]
},
{
"cell_type": "code",
"execution_count": 221,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" time | \n",
" generation biomass | \n",
" generation fossil brown coal/lignite | \n",
" generation fossil coal-derived gas | \n",
" generation fossil gas | \n",
" generation fossil hard coal | \n",
" generation fossil oil | \n",
" generation fossil oil shale | \n",
" generation fossil peat | \n",
" generation geothermal | \n",
" ... | \n",
" generation waste | \n",
" generation wind offshore | \n",
" generation wind onshore | \n",
" forecast solar day ahead | \n",
" forecast wind offshore eday ahead | \n",
" forecast wind onshore day ahead | \n",
" total load forecast | \n",
" total load actual | \n",
" price day ahead | \n",
" price actual | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 2015-01-01 00:00:00+01:00 | \n",
" 447.0 | \n",
" 329.0 | \n",
" 0.0 | \n",
" 4844.0 | \n",
" 4821.0 | \n",
" 162.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 196.0 | \n",
" 0.0 | \n",
" 6378.0 | \n",
" 17.0 | \n",
" NaN | \n",
" 6436.0 | \n",
" 26118.0 | \n",
" 25385.0 | \n",
" 50.10 | \n",
" 65.41 | \n",
"
\n",
" \n",
" | 1 | \n",
" 2015-01-01 01:00:00+01:00 | \n",
" 449.0 | \n",
" 328.0 | \n",
" 0.0 | \n",
" 5196.0 | \n",
" 4755.0 | \n",
" 158.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 195.0 | \n",
" 0.0 | \n",
" 5890.0 | \n",
" 16.0 | \n",
" NaN | \n",
" 5856.0 | \n",
" 24934.0 | \n",
" 24382.0 | \n",
" 48.10 | \n",
" 64.92 | \n",
"
\n",
" \n",
" | 2 | \n",
" 2015-01-01 02:00:00+01:00 | \n",
" 448.0 | \n",
" 323.0 | \n",
" 0.0 | \n",
" 4857.0 | \n",
" 4581.0 | \n",
" 157.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 196.0 | \n",
" 0.0 | \n",
" 5461.0 | \n",
" 8.0 | \n",
" NaN | \n",
" 5454.0 | \n",
" 23515.0 | \n",
" 22734.0 | \n",
" 47.33 | \n",
" 64.48 | \n",
"
\n",
" \n",
" | 3 | \n",
" 2015-01-01 03:00:00+01:00 | \n",
" 438.0 | \n",
" 254.0 | \n",
" 0.0 | \n",
" 4314.0 | \n",
" 4131.0 | \n",
" 160.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 191.0 | \n",
" 0.0 | \n",
" 5238.0 | \n",
" 2.0 | \n",
" NaN | \n",
" 5151.0 | \n",
" 22642.0 | \n",
" 21286.0 | \n",
" 42.27 | \n",
" 59.32 | \n",
"
\n",
" \n",
" | 4 | \n",
" 2015-01-01 04:00:00+01:00 | \n",
" 428.0 | \n",
" 187.0 | \n",
" 0.0 | \n",
" 4130.0 | \n",
" 3840.0 | \n",
" 156.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 189.0 | \n",
" 0.0 | \n",
" 4935.0 | \n",
" 9.0 | \n",
" NaN | \n",
" 4861.0 | \n",
" 21785.0 | \n",
" 20264.0 | \n",
" 38.41 | \n",
" 56.04 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 35059 | \n",
" 2018-12-31 19:00:00+01:00 | \n",
" 297.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 7634.0 | \n",
" 2628.0 | \n",
" 178.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 277.0 | \n",
" 0.0 | \n",
" 3113.0 | \n",
" 96.0 | \n",
" NaN | \n",
" 3253.0 | \n",
" 30619.0 | \n",
" 30653.0 | \n",
" 68.85 | \n",
" 77.02 | \n",
"
\n",
" \n",
" | 35060 | \n",
" 2018-12-31 20:00:00+01:00 | \n",
" 296.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 7241.0 | \n",
" 2566.0 | \n",
" 174.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 280.0 | \n",
" 0.0 | \n",
" 3288.0 | \n",
" 51.0 | \n",
" NaN | \n",
" 3353.0 | \n",
" 29932.0 | \n",
" 29735.0 | \n",
" 68.40 | \n",
" 76.16 | \n",
"
\n",
" \n",
" | 35061 | \n",
" 2018-12-31 21:00:00+01:00 | \n",
" 292.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 7025.0 | \n",
" 2422.0 | \n",
" 168.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 286.0 | \n",
" 0.0 | \n",
" 3503.0 | \n",
" 36.0 | \n",
" NaN | \n",
" 3404.0 | \n",
" 27903.0 | \n",
" 28071.0 | \n",
" 66.88 | \n",
" 74.30 | \n",
"
\n",
" \n",
" | 35062 | \n",
" 2018-12-31 22:00:00+01:00 | \n",
" 293.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 6562.0 | \n",
" 2293.0 | \n",
" 163.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 287.0 | \n",
" 0.0 | \n",
" 3586.0 | \n",
" 29.0 | \n",
" NaN | \n",
" 3273.0 | \n",
" 25450.0 | \n",
" 25801.0 | \n",
" 63.93 | \n",
" 69.89 | \n",
"
\n",
" \n",
" | 35063 | \n",
" 2018-12-31 23:00:00+01:00 | \n",
" 290.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 6926.0 | \n",
" 2166.0 | \n",
" 163.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" ... | \n",
" 287.0 | \n",
" 0.0 | \n",
" 3651.0 | \n",
" 26.0 | \n",
" NaN | \n",
" 3117.0 | \n",
" 24424.0 | \n",
" 24455.0 | \n",
" 64.27 | \n",
" 69.88 | \n",
"
\n",
" \n",
"
\n",
"
35064 rows × 29 columns
\n",
"
"
],
"text/plain": [
" time generation biomass \\\n",
"0 2015-01-01 00:00:00+01:00 447.0 \n",
"1 2015-01-01 01:00:00+01:00 449.0 \n",
"2 2015-01-01 02:00:00+01:00 448.0 \n",
"3 2015-01-01 03:00:00+01:00 438.0 \n",
"4 2015-01-01 04:00:00+01:00 428.0 \n",
"... ... ... \n",
"35059 2018-12-31 19:00:00+01:00 297.0 \n",
"35060 2018-12-31 20:00:00+01:00 296.0 \n",
"35061 2018-12-31 21:00:00+01:00 292.0 \n",
"35062 2018-12-31 22:00:00+01:00 293.0 \n",
"35063 2018-12-31 23:00:00+01:00 290.0 \n",
"\n",
" generation fossil brown coal/lignite \\\n",
"0 329.0 \n",
"1 328.0 \n",
"2 323.0 \n",
"3 254.0 \n",
"4 187.0 \n",
"... ... \n",
"35059 0.0 \n",
"35060 0.0 \n",
"35061 0.0 \n",
"35062 0.0 \n",
"35063 0.0 \n",
"\n",
" generation fossil coal-derived gas generation fossil gas \\\n",
"0 0.0 4844.0 \n",
"1 0.0 5196.0 \n",
"2 0.0 4857.0 \n",
"3 0.0 4314.0 \n",
"4 0.0 4130.0 \n",
"... ... ... \n",
"35059 0.0 7634.0 \n",
"35060 0.0 7241.0 \n",
"35061 0.0 7025.0 \n",
"35062 0.0 6562.0 \n",
"35063 0.0 6926.0 \n",
"\n",
" generation fossil hard coal generation fossil oil \\\n",
"0 4821.0 162.0 \n",
"1 4755.0 158.0 \n",
"2 4581.0 157.0 \n",
"3 4131.0 160.0 \n",
"4 3840.0 156.0 \n",
"... ... ... \n",
"35059 2628.0 178.0 \n",
"35060 2566.0 174.0 \n",
"35061 2422.0 168.0 \n",
"35062 2293.0 163.0 \n",
"35063 2166.0 163.0 \n",
"\n",
" generation fossil oil shale generation fossil peat \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
"... ... ... \n",
"35059 0.0 0.0 \n",
"35060 0.0 0.0 \n",
"35061 0.0 0.0 \n",
"35062 0.0 0.0 \n",
"35063 0.0 0.0 \n",
"\n",
" generation geothermal ... generation waste generation wind offshore \\\n",
"0 0.0 ... 196.0 0.0 \n",
"1 0.0 ... 195.0 0.0 \n",
"2 0.0 ... 196.0 0.0 \n",
"3 0.0 ... 191.0 0.0 \n",
"4 0.0 ... 189.0 0.0 \n",
"... ... ... ... ... \n",
"35059 0.0 ... 277.0 0.0 \n",
"35060 0.0 ... 280.0 0.0 \n",
"35061 0.0 ... 286.0 0.0 \n",
"35062 0.0 ... 287.0 0.0 \n",
"35063 0.0 ... 287.0 0.0 \n",
"\n",
" generation wind onshore forecast solar day ahead \\\n",
"0 6378.0 17.0 \n",
"1 5890.0 16.0 \n",
"2 5461.0 8.0 \n",
"3 5238.0 2.0 \n",
"4 4935.0 9.0 \n",
"... ... ... \n",
"35059 3113.0 96.0 \n",
"35060 3288.0 51.0 \n",
"35061 3503.0 36.0 \n",
"35062 3586.0 29.0 \n",
"35063 3651.0 26.0 \n",
"\n",
" forecast wind offshore eday ahead forecast wind onshore day ahead \\\n",
"0 NaN 6436.0 \n",
"1 NaN 5856.0 \n",
"2 NaN 5454.0 \n",
"3 NaN 5151.0 \n",
"4 NaN 4861.0 \n",
"... ... ... \n",
"35059 NaN 3253.0 \n",
"35060 NaN 3353.0 \n",
"35061 NaN 3404.0 \n",
"35062 NaN 3273.0 \n",
"35063 NaN 3117.0 \n",
"\n",
" total load forecast total load actual price day ahead price actual \n",
"0 26118.0 25385.0 50.10 65.41 \n",
"1 24934.0 24382.0 48.10 64.92 \n",
"2 23515.0 22734.0 47.33 64.48 \n",
"3 22642.0 21286.0 42.27 59.32 \n",
"4 21785.0 20264.0 38.41 56.04 \n",
"... ... ... ... ... \n",
"35059 30619.0 30653.0 68.85 77.02 \n",
"35060 29932.0 29735.0 68.40 76.16 \n",
"35061 27903.0 28071.0 66.88 74.30 \n",
"35062 25450.0 25801.0 63.93 69.89 \n",
"35063 24424.0 24455.0 64.27 69.88 \n",
"\n",
"[35064 rows x 29 columns]"
]
},
"execution_count": 221,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"energy"
]
},
{
"cell_type": "code",
"execution_count": 222,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['time', 'generation biomass', 'generation fossil brown coal/lignite',\n",
" 'generation fossil coal-derived gas', 'generation fossil gas',\n",
" 'generation fossil hard coal', 'generation fossil oil',\n",
" 'generation fossil oil shale', 'generation fossil peat',\n",
" 'generation geothermal', 'generation hydro pumped storage aggregated',\n",
" 'generation hydro pumped storage consumption',\n",
" 'generation hydro run-of-river and poundage',\n",
" 'generation hydro water reservoir', 'generation marine',\n",
" 'generation nuclear', 'generation other', 'generation other renewable',\n",
" 'generation solar', 'generation waste', 'generation wind offshore',\n",
" 'generation wind onshore', 'forecast solar day ahead',\n",
" 'forecast wind offshore eday ahead', 'forecast wind onshore day ahead',\n",
" 'total load forecast', 'total load actual', 'price day ahead',\n",
" 'price actual'],\n",
" dtype='object')"
]
},
"execution_count": 222,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"energy.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se cambia el nombre de las variables para hacerlos mas sencillos y manejables."
]
},
{
"cell_type": "code",
"execution_count": 223,
"metadata": {},
"outputs": [],
"source": [
"energy.columns = ['time', 'biomass', 'lignite', 'coal_dev_gas', 'gas', 'hard_coal', 'oil', 'oil_shale', 'peat', \n",
" 'geothermal', 'hydro_pumped_storage_aggregated', 'hydro_pumped_storage_consumption',\n",
" 'hydro_run_river', 'hydro_water_reservoir', 'marine', 'nuclear', 'other',\n",
" 'other_renewable', 'solar', 'waste', 'wind_offshore', 'wind_onshore', 'forecast_solar_day_ahead',\n",
" 'forecast_wind_offshore_day_ahead', 'forecast_wind_onshore_day_ahead',\n",
" 'total_load_forecast', 'total_load_actual', 'price_day_ahead',\n",
" 'price_actual'] "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se eliminan las variables vacias o que unicamente contienen 0 o están vacias. Para ello se realiza primero un sumatorio por variables y se descartan aquellas que suman 0."
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"time 2015-01-01 00:00:00+01:002015-01-01 01:00:00+0...\n",
"biomass 1.34402e+07\n",
"lignite 1.57027e+07\n",
"coal_dev_gas 0\n",
"gas 1.97054e+08\n",
"hard_coal 1.49158e+08\n",
"oil 1.04546e+07\n",
"oil_shale 0\n",
"peat 0\n",
"geothermal 0\n",
"hydro_pumped_storage_aggregated 0\n",
"hydro_pumped_storage_consumption 1.66666e+07\n",
"hydro_run_river 3.40678e+07\n",
"hydro_water_reservoir 9.12989e+07\n",
"marine 0\n",
"nuclear 2.19531e+08\n",
"other 2.11077e+06\n",
"other_renewable 3.00133e+06\n",
"solar 5.02092e+07\n",
"waste 9.44295e+06\n",
"wind_offshore 0\n",
"wind_onshore 1.91508e+08\n",
"forecast_solar_day_ahead 5.04594e+07\n",
"forecast_wind_offshore_day_ahead 0\n",
"forecast_wind_onshore_day_ahead 1.91843e+08\n",
"total_load_forecast 1.00676e+09\n",
"total_load_actual 1.0052e+09\n",
"price_day_ahead 1.74879e+06\n",
"price_actual 2.02965e+06\n",
"dtype: object"
]
},
"execution_count": 224,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"energy.sum() "
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {},
"outputs": [],
"source": [
"energy = energy.drop(['coal_dev_gas', 'oil_shale', 'peat', 'geothermal',\n",
" 'hydro_pumped_storage_aggregated', 'marine', 'wind_offshore'], axis=1) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se eliminan variables relativas a la previsión de producción eléctrica de energías renovables, ya que no van a ser utilizadas."
]
},
{
"cell_type": "code",
"execution_count": 226,
"metadata": {},
"outputs": [],
"source": [
"energy = energy.drop(['forecast_solar_day_ahead', 'forecast_wind_offshore_day_ahead', 'forecast_wind_onshore_day_ahead'],\n",
" axis=1) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Puesto que la variable 'time' contiene información tanto de la fechacomo de la hora en una sola columna se divide en dos.\n",
"Para ellos se crea una nueva columna con los datos relativos únicamente a la fecha y otra con las información acerca de la hora. A partir de esta última columna se crea una nueva que contiene solamente la hora y se ignora la huso horario, puesto que todos los datos corresponden a un mismo país. \n",
"\n",
"Al terminar se eliminan tanto la columna 'time' original como la columna intermedia utilizada para la hora y se reordenan las variables para que las nuevas aprezcan al principio del dataset."
]
},
{
"cell_type": "code",
"execution_count": 227,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2015-01-01 00:00:00+01:00\n",
"1 2015-01-01 01:00:00+01:00\n",
"2 2015-01-01 02:00:00+01:00\n",
"3 2015-01-01 03:00:00+01:00\n",
"4 2015-01-01 04:00:00+01:00\n",
" ... \n",
"35059 2018-12-31 19:00:00+01:00\n",
"35060 2018-12-31 20:00:00+01:00\n",
"35061 2018-12-31 21:00:00+01:00\n",
"35062 2018-12-31 22:00:00+01:00\n",
"35063 2018-12-31 23:00:00+01:00\n",
"Name: time, Length: 35064, dtype: object"
]
},
"execution_count": 227,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"energy['time'] "
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {},
"outputs": [],
"source": [
"energy['date'] = energy.time.str.split(' ').str[0] "
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {},
"outputs": [],
"source": [
"energy['hour1'] = energy.time.str.split(' ').str[1] "
]
},
{
"cell_type": "code",
"execution_count": 230,
"metadata": {},
"outputs": [],
"source": [
"energy['hour'] = energy.hour1.str.split('+').str[0] "
]
},
{
"cell_type": "code",
"execution_count": 231,
"metadata": {},
"outputs": [],
"source": [
"energy = energy.drop(['time', 'hour1'], axis=1) "
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {},
"outputs": [],
"source": [
"energy = energy.reindex(columns = ['date', 'hour', 'biomass', 'lignite', 'gas', 'hard_coal', 'oil',\n",
" 'hydro_pumped_storage_consumption', 'hydro_run_river','hydro_water_reservoir', 'nuclear', 'other',\n",
" 'other_renewable', 'solar', 'waste', 'wind_onshore', 'total_load_forecast',\n",
" 'total_load_actual', 'price_day_ahead', 'price_actual']) "
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" hour | \n",
" biomass | \n",
" lignite | \n",
" gas | \n",
" hard_coal | \n",
" oil | \n",
" hydro_pumped_storage_consumption | \n",
" hydro_run_river | \n",
" hydro_water_reservoir | \n",
" nuclear | \n",
" other | \n",
" other_renewable | \n",
" solar | \n",
" waste | \n",
" wind_onshore | \n",
" total_load_forecast | \n",
" total_load_actual | \n",
" price_day_ahead | \n",
" price_actual | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 2015-01-01 | \n",
" 00:00:00 | \n",
" 447.0 | \n",
" 329.0 | \n",
" 4844.0 | \n",
" 4821.0 | \n",
" 162.0 | \n",
" 863.0 | \n",
" 1051.0 | \n",
" 1899.0 | \n",
" 7096.0 | \n",
" 43.0 | \n",
" 73.0 | \n",
" 49.0 | \n",
" 196.0 | \n",
" 6378.0 | \n",
" 26118.0 | \n",
" 25385.0 | \n",
" 50.10 | \n",
" 65.41 | \n",
"
\n",
" \n",
" | 1 | \n",
" 2015-01-01 | \n",
" 01:00:00 | \n",
" 449.0 | \n",
" 328.0 | \n",
" 5196.0 | \n",
" 4755.0 | \n",
" 158.0 | \n",
" 920.0 | \n",
" 1009.0 | \n",
" 1658.0 | \n",
" 7096.0 | \n",
" 43.0 | \n",
" 71.0 | \n",
" 50.0 | \n",
" 195.0 | \n",
" 5890.0 | \n",
" 24934.0 | \n",
" 24382.0 | \n",
" 48.10 | \n",
" 64.92 | \n",
"
\n",
" \n",
" | 2 | \n",
" 2015-01-01 | \n",
" 02:00:00 | \n",
" 448.0 | \n",
" 323.0 | \n",
" 4857.0 | \n",
" 4581.0 | \n",
" 157.0 | \n",
" 1164.0 | \n",
" 973.0 | \n",
" 1371.0 | \n",
" 7099.0 | \n",
" 43.0 | \n",
" 73.0 | \n",
" 50.0 | \n",
" 196.0 | \n",
" 5461.0 | \n",
" 23515.0 | \n",
" 22734.0 | \n",
" 47.33 | \n",
" 64.48 | \n",
"
\n",
" \n",
" | 3 | \n",
" 2015-01-01 | \n",
" 03:00:00 | \n",
" 438.0 | \n",
" 254.0 | \n",
" 4314.0 | \n",
" 4131.0 | \n",
" 160.0 | \n",
" 1503.0 | \n",
" 949.0 | \n",
" 779.0 | \n",
" 7098.0 | \n",
" 43.0 | \n",
" 75.0 | \n",
" 50.0 | \n",
" 191.0 | \n",
" 5238.0 | \n",
" 22642.0 | \n",
" 21286.0 | \n",
" 42.27 | \n",
" 59.32 | \n",
"
\n",
" \n",
" | 4 | \n",
" 2015-01-01 | \n",
" 04:00:00 | \n",
" 428.0 | \n",
" 187.0 | \n",
" 4130.0 | \n",
" 3840.0 | \n",
" 156.0 | \n",
" 1826.0 | \n",
" 953.0 | \n",
" 720.0 | \n",
" 7097.0 | \n",
" 43.0 | \n",
" 74.0 | \n",
" 42.0 | \n",
" 189.0 | \n",
" 4935.0 | \n",
" 21785.0 | \n",
" 20264.0 | \n",
" 38.41 | \n",
" 56.04 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date hour biomass lignite gas hard_coal oil \\\n",
"0 2015-01-01 00:00:00 447.0 329.0 4844.0 4821.0 162.0 \n",
"1 2015-01-01 01:00:00 449.0 328.0 5196.0 4755.0 158.0 \n",
"2 2015-01-01 02:00:00 448.0 323.0 4857.0 4581.0 157.0 \n",
"3 2015-01-01 03:00:00 438.0 254.0 4314.0 4131.0 160.0 \n",
"4 2015-01-01 04:00:00 428.0 187.0 4130.0 3840.0 156.0 \n",
"\n",
" hydro_pumped_storage_consumption hydro_run_river hydro_water_reservoir \\\n",
"0 863.0 1051.0 1899.0 \n",
"1 920.0 1009.0 1658.0 \n",
"2 1164.0 973.0 1371.0 \n",
"3 1503.0 949.0 779.0 \n",
"4 1826.0 953.0 720.0 \n",
"\n",
" nuclear other other_renewable solar waste wind_onshore \\\n",
"0 7096.0 43.0 73.0 49.0 196.0 6378.0 \n",
"1 7096.0 43.0 71.0 50.0 195.0 5890.0 \n",
"2 7099.0 43.0 73.0 50.0 196.0 5461.0 \n",
"3 7098.0 43.0 75.0 50.0 191.0 5238.0 \n",
"4 7097.0 43.0 74.0 42.0 189.0 4935.0 \n",
"\n",
" total_load_forecast total_load_actual price_day_ahead price_actual \n",
"0 26118.0 25385.0 50.10 65.41 \n",
"1 24934.0 24382.0 48.10 64.92 \n",
"2 23515.0 22734.0 47.33 64.48 \n",
"3 22642.0 21286.0 42.27 59.32 \n",
"4 21785.0 20264.0 38.41 56.04 "
]
},
"execution_count": 233,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"energy.head() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se observa la información relativa a las variables con las que se va a desarrollar el análisis."
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 35064 entries, 0 to 35063\n",
"Data columns (total 20 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 date 35064 non-null object \n",
" 1 hour 35064 non-null object \n",
" 2 biomass 35045 non-null float64\n",
" 3 lignite 35046 non-null float64\n",
" 4 gas 35046 non-null float64\n",
" 5 hard_coal 35046 non-null float64\n",
" 6 oil 35045 non-null float64\n",
" 7 hydro_pumped_storage_consumption 35045 non-null float64\n",
" 8 hydro_run_river 35045 non-null float64\n",
" 9 hydro_water_reservoir 35046 non-null float64\n",
" 10 nuclear 35047 non-null float64\n",
" 11 other 35046 non-null float64\n",
" 12 other_renewable 35046 non-null float64\n",
" 13 solar 35046 non-null float64\n",
" 14 waste 35045 non-null float64\n",
" 15 wind_onshore 35046 non-null float64\n",
" 16 total_load_forecast 35064 non-null float64\n",
" 17 total_load_actual 35028 non-null float64\n",
" 18 price_day_ahead 35064 non-null float64\n",
" 19 price_actual 35064 non-null float64\n",
"dtypes: float64(18), object(2)\n",
"memory usage: 5.4+ MB\n"
]
}
],
"source": [
"energy.info() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se observa en número de valores nulos, no son significativos."
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"date 0\n",
"hour 0\n",
"biomass 19\n",
"lignite 18\n",
"gas 18\n",
"hard_coal 18\n",
"oil 19\n",
"hydro_pumped_storage_consumption 19\n",
"hydro_run_river 19\n",
"hydro_water_reservoir 18\n",
"nuclear 17\n",
"other 18\n",
"other_renewable 18\n",
"solar 18\n",
"waste 19\n",
"wind_onshore 18\n",
"total_load_forecast 0\n",
"total_load_actual 36\n",
"price_day_ahead 0\n",
"price_actual 0\n",
"dtype: int64"
]
},
"execution_count": 235,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"energy.isnull().sum(axis=0) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Definición de las variables:\n",
"- __date__: fecha, en formato AAAA-MM-DD.\n",
"- __hour__: hora, en formato hh:mm:ss.\n",
"- __biomass__: producción electrica generada por biomasa en MW (megavatios). \n",
"- __lignite__: producción eléctrica generada por lignito en MW (megavatios). \n",
"- __gas__: producción eléctrica generada por gas en MW (megavatios).\n",
"- __hard_coal__: producción eléctrica generada por gas en MW (megavatios).\n",
"- __oil__: producción eléctrica generada por petroleo en MW (megavatios).\n",
"- __hydro_pumped_storage_consumption__: producción eléctrica generada por centrales hidroeléctricas reversibles en MW (megavatios).\n",
"- __hydro_run_river__: producción eléctrica generada por centrales hidroeléctricas de pasada en MW (megavatios).\n",
"- __hydro_water_reservoir__: producción eléctrica generada por centrales hidroeléctricas convencionales en MW (megavatios).\n",
"- __nuclear__: producción eléctrica generada por energía nuclear en MW (megavatios). \n",
"- __other__: producción eléctrica generada por otras fuentes de energía en MW (megavatios). \n",
"- __other_renewable__: producción electrica generada por otras fuentes de energía renovable en MW (megavatios).\n",
"- __solar__: producción eléctrica generada por energía solar en MW (megavatios).\n",
"- __waste__: producción eléctrica generada por aprovechamiento de residuos en MW (megavatios). \n",
"- __wind_onshore__: producción eléctrica generada por energía eólica terrestre en MW (megavatios). \n",
"- __total_load_forecast__: demanda eléctrica esperada en MW (megavatios). \n",
"- __total_load_actual__: demanda eléctrica real en MW (megavatios). \n",
"- __price_day_ahead__: precio esperado en EUR/MWh (megavatios por hora). \n",
"- __price_actual__: precio real en EUR/MWh (megavatios por hora). \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analisis de datos"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se presenta una tabla con los estadísticos básicos de las variables."
]
},
{
"cell_type": "code",
"execution_count": 236,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" biomass | \n",
" lignite | \n",
" gas | \n",
" hard_coal | \n",
" oil | \n",
" hydro_pumped_storage_consumption | \n",
" hydro_run_river | \n",
" hydro_water_reservoir | \n",
" nuclear | \n",
" other | \n",
" other_renewable | \n",
" solar | \n",
" waste | \n",
" wind_onshore | \n",
" total_load_forecast | \n",
" total_load_actual | \n",
" price_day_ahead | \n",
" price_actual | \n",
"
\n",
" \n",
" \n",
" \n",
" | count | \n",
" 35045.000000 | \n",
" 35046.000000 | \n",
" 35046.000000 | \n",
" 35046.000000 | \n",
" 35045.000000 | \n",
" 35045.000000 | \n",
" 35045.000000 | \n",
" 35046.000000 | \n",
" 35047.000000 | \n",
" 35046.000000 | \n",
" 35046.000000 | \n",
" 35046.000000 | \n",
" 35045.000000 | \n",
" 35046.000000 | \n",
" 35064.000000 | \n",
" 35028.000000 | \n",
" 35064.000000 | \n",
" 35064.000000 | \n",
"
\n",
" \n",
" | mean | \n",
" 383.513540 | \n",
" 448.059208 | \n",
" 5622.737488 | \n",
" 4256.065742 | \n",
" 298.319789 | \n",
" 475.577343 | \n",
" 972.116108 | \n",
" 2605.114735 | \n",
" 6263.907039 | \n",
" 60.228585 | \n",
" 85.639702 | \n",
" 1432.665925 | \n",
" 269.452133 | \n",
" 5464.479769 | \n",
" 28712.129962 | \n",
" 28696.939905 | \n",
" 49.874341 | \n",
" 57.884023 | \n",
"
\n",
" \n",
" | std | \n",
" 85.353943 | \n",
" 354.568590 | \n",
" 2201.830478 | \n",
" 1961.601013 | \n",
" 52.520673 | \n",
" 792.406614 | \n",
" 400.777536 | \n",
" 1835.199745 | \n",
" 839.667958 | \n",
" 20.238381 | \n",
" 14.077554 | \n",
" 1680.119887 | \n",
" 50.195536 | \n",
" 3213.691587 | \n",
" 4594.100854 | \n",
" 4574.987950 | \n",
" 14.618900 | \n",
" 14.204083 | \n",
"
\n",
" \n",
" | min | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 18105.000000 | \n",
" 18041.000000 | \n",
" 2.060000 | \n",
" 9.330000 | \n",
"
\n",
" \n",
" | 25% | \n",
" 333.000000 | \n",
" 0.000000 | \n",
" 4126.000000 | \n",
" 2527.000000 | \n",
" 263.000000 | \n",
" 0.000000 | \n",
" 637.000000 | \n",
" 1077.250000 | \n",
" 5760.000000 | \n",
" 53.000000 | \n",
" 73.000000 | \n",
" 71.000000 | \n",
" 240.000000 | \n",
" 2933.000000 | \n",
" 24793.750000 | \n",
" 24807.750000 | \n",
" 41.490000 | \n",
" 49.347500 | \n",
"
\n",
" \n",
" | 50% | \n",
" 367.000000 | \n",
" 509.000000 | \n",
" 4969.000000 | \n",
" 4474.000000 | \n",
" 300.000000 | \n",
" 68.000000 | \n",
" 906.000000 | \n",
" 2164.000000 | \n",
" 6566.000000 | \n",
" 57.000000 | \n",
" 88.000000 | \n",
" 616.000000 | \n",
" 279.000000 | \n",
" 4849.000000 | \n",
" 28906.000000 | \n",
" 28901.000000 | \n",
" 50.520000 | \n",
" 58.020000 | \n",
"
\n",
" \n",
" | 75% | \n",
" 433.000000 | \n",
" 757.000000 | \n",
" 6429.000000 | \n",
" 5838.750000 | \n",
" 330.000000 | \n",
" 616.000000 | \n",
" 1250.000000 | \n",
" 3757.000000 | \n",
" 7025.000000 | \n",
" 80.000000 | \n",
" 97.000000 | \n",
" 2578.000000 | \n",
" 310.000000 | \n",
" 7398.000000 | \n",
" 32263.250000 | \n",
" 32192.000000 | \n",
" 60.530000 | \n",
" 68.010000 | \n",
"
\n",
" \n",
" | max | \n",
" 592.000000 | \n",
" 999.000000 | \n",
" 20034.000000 | \n",
" 8359.000000 | \n",
" 449.000000 | \n",
" 4523.000000 | \n",
" 2000.000000 | \n",
" 9728.000000 | \n",
" 7117.000000 | \n",
" 106.000000 | \n",
" 119.000000 | \n",
" 5792.000000 | \n",
" 357.000000 | \n",
" 17436.000000 | \n",
" 41390.000000 | \n",
" 41015.000000 | \n",
" 101.990000 | \n",
" 116.800000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" biomass lignite gas hard_coal oil \\\n",
"count 35045.000000 35046.000000 35046.000000 35046.000000 35045.000000 \n",
"mean 383.513540 448.059208 5622.737488 4256.065742 298.319789 \n",
"std 85.353943 354.568590 2201.830478 1961.601013 52.520673 \n",
"min 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"25% 333.000000 0.000000 4126.000000 2527.000000 263.000000 \n",
"50% 367.000000 509.000000 4969.000000 4474.000000 300.000000 \n",
"75% 433.000000 757.000000 6429.000000 5838.750000 330.000000 \n",
"max 592.000000 999.000000 20034.000000 8359.000000 449.000000 \n",
"\n",
" hydro_pumped_storage_consumption hydro_run_river \\\n",
"count 35045.000000 35045.000000 \n",
"mean 475.577343 972.116108 \n",
"std 792.406614 400.777536 \n",
"min 0.000000 0.000000 \n",
"25% 0.000000 637.000000 \n",
"50% 68.000000 906.000000 \n",
"75% 616.000000 1250.000000 \n",
"max 4523.000000 2000.000000 \n",
"\n",
" hydro_water_reservoir nuclear other other_renewable \\\n",
"count 35046.000000 35047.000000 35046.000000 35046.000000 \n",
"mean 2605.114735 6263.907039 60.228585 85.639702 \n",
"std 1835.199745 839.667958 20.238381 14.077554 \n",
"min 0.000000 0.000000 0.000000 0.000000 \n",
"25% 1077.250000 5760.000000 53.000000 73.000000 \n",
"50% 2164.000000 6566.000000 57.000000 88.000000 \n",
"75% 3757.000000 7025.000000 80.000000 97.000000 \n",
"max 9728.000000 7117.000000 106.000000 119.000000 \n",
"\n",
" solar waste wind_onshore total_load_forecast \\\n",
"count 35046.000000 35045.000000 35046.000000 35064.000000 \n",
"mean 1432.665925 269.452133 5464.479769 28712.129962 \n",
"std 1680.119887 50.195536 3213.691587 4594.100854 \n",
"min 0.000000 0.000000 0.000000 18105.000000 \n",
"25% 71.000000 240.000000 2933.000000 24793.750000 \n",
"50% 616.000000 279.000000 4849.000000 28906.000000 \n",
"75% 2578.000000 310.000000 7398.000000 32263.250000 \n",
"max 5792.000000 357.000000 17436.000000 41390.000000 \n",
"\n",
" total_load_actual price_day_ahead price_actual \n",
"count 35028.000000 35064.000000 35064.000000 \n",
"mean 28696.939905 49.874341 57.884023 \n",
"std 4574.987950 14.618900 14.204083 \n",
"min 18041.000000 2.060000 9.330000 \n",
"25% 24807.750000 41.490000 49.347500 \n",
"50% 28901.000000 50.520000 58.020000 \n",
"75% 32192.000000 60.530000 68.010000 \n",
"max 41015.000000 101.990000 116.800000 "
]
},
"execution_count": 236,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"energy.describe() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Se realiza un breve resumen de un análisis de datos para tener una idea inicial de las variables."
]
},
{
"cell_type": "code",
"execution_count": 237,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7427d8b7866c4d628542767da8683559",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(HTML(value='Summarize dataset'), FloatProgress(value=0.0, max=34.0), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "35c4b286ac0d48e2b705069fc7c69867",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(HTML(value='Generate report structure'), FloatProgress(value=0.0, max=1.0), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9816589f959c4468aa21ac97a7e34d64",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(HTML(value='Render HTML'), FloatProgress(value=0.0, max=1.0), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"